Classic Asynchronous Workflow Example

This section describes the most common usage of the asynchronous endpoints.

This flow does three things:

Create a collection with multiple requests.
Execute a Run for this collection.
Query the run status until it completes.

1. Create a collection

First, we create a collection that groups the requests we want to run asynchronously.

Endpoint

POST /v1/async/collections

Request Body Example

{
  "name": "new collection",
  "requests": [
    {
      "url": "www.google.com",
      "browser": true,
      "screenshot": false,
      "actions": [
        {
          "type": "wait-for-timeout",
          "time": 5000
        }
      ]
    },
    {
      "url": "www.example.com",
      "browser": true,
      "screenshot": false,
      "actions": [
        {
          "type": "wait-for-timeout",
          "time": 5000
        }
      ]
    }
  ]
}

Expected Response

{
  "id": "c38b0bcf-cb7c-4728-8704-2c2e267dcff9",
  "name": "new collection",
  "message": "Collection created successfully."
}

At this point, the collection is ready to be executed. Save the id value from the response: it is the collection_id needed in the following steps.
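As a rough sketch of this step in Python, the helpers below build a request body shaped like the example above and post it. The names build_collection_payload and create_collection, the base_url and headers arguments, and the session parameter (assumed to behave like a requests.Session) are illustrative assumptions, not part of the API. Authentication headers are left to the caller.

```python
from typing import Any


def build_collection_payload(name: str, urls: list[str], wait_ms: int = 5000) -> dict[str, Any]:
    """Build a request body shaped like the example above (one request per URL)."""
    return {
        "name": name,
        "requests": [
            {
                "url": url,
                "browser": True,
                "screenshot": False,
                "actions": [{"type": "wait-for-timeout", "time": wait_ms}],
            }
            for url in urls
        ],
    }


def create_collection(session: Any, base_url: str, headers: dict[str, str],
                      name: str, urls: list[str]) -> str:
    """POST the collection and return the "id" field of the response.

    `session` is assumed to expose a requests-style post(url, headers=, json=, timeout=).
    """
    resp = session.post(
        f"{base_url}/v1/async/collections",
        headers=headers,
        json=build_collection_payload(name, urls),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["id"]  # this value is the collection_id used in later steps
```

With the requests library installed, passing requests.Session() as session, together with your own base URL and headers, would return the collection id as a string.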

2. Create a Run for the Collection

Once the collection has been created, we can start the Run execution.

A Run represents a single execution of the requests placed in the collection.

Endpoint

POST /v1/async/collections/{collection_id}/run

Parameters

No request body is needed; only the collection_id path parameter is required.

Response Example

{
  "run_id": "9b64941a-4545-4c57-9174-c70e781d9192",
  "status": "in_progress",
  "total_requests": 2,
  "success_requests": 0,
  "failed_requests": 0,
  "timeout_requests": 0,
  "collection_id": "c38b0bcf-cb7c-4728-8704-2c2e267dcff9"
}

The run is created and begins executing asynchronously.

The initial status is always in_progress.
The run_id uniquely identifies the execution and should be saved to track the Run.
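A minimal sketch of starting the run from Python follows. The helper name start_run and its arguments are hypothetical, and session is assumed to be a requests.Session-like object. The helper returns the whole response body so the caller can store both run_id and collection_id.

```python
from typing import Any


def start_run(session: Any, base_url: str, headers: dict[str, str],
              collection_id: str) -> dict[str, Any]:
    """Start a run for the collection and return the parsed response body."""
    resp = session.post(
        f"{base_url}/v1/async/collections/{collection_id}/run",
        headers=headers,  # no request body is sent for this endpoint
        timeout=30,
    )
    resp.raise_for_status()
    run = resp.json()
    if "run_id" not in run:
        raise ValueError("response did not include a run_id")
    return run
```

Persisting run["run_id"] right away (for example in a database row next to the collection id) makes it possible to resume tracking after a client restart.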

3. Query the Run Status

Since the run is asynchronous, the execution takes a variable amount of time depending on the number of requests and their complexity.

After a short wait, you can query the run status using the run_id.

Endpoint

GET /v1/async/collections/{collection_id}/runs/{run_id}

Response Example (Still In Progress)

{
  "run_id": "9b64941a-4545-4c57-9174-c70e781d9192",
  "status": "in_progress",
  "total_requests": 2,
  "success_requests": 1,
  "failed_requests": 0,
  "timeout_requests": 0,
  "collection_id": "c38b0bcf-cb7c-4728-8704-2c2e267dcff9"
}

This response indicates that the Run has started but has not finished: one of the two requests has succeeded and the other is still pending.
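One way to query the status repeatedly is a polling loop with a growing delay, sketched below. The function wait_for_run and its parameters are hypothetical, and the sketch assumes that any status other than in_progress is terminal, which this page does not state explicitly. The fetch_status callable is expected to perform the GET request and return the parsed JSON.

```python
import time
from typing import Any, Callable


def wait_for_run(
    fetch_status: Callable[[], dict[str, Any]],
    initial_delay: float = 2.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Poll until the run leaves "in_progress", doubling the delay up to max_delay."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        run = fetch_status()
        if run["status"] != "in_progress":  # assumed to mean the run has finished
            return run
        if clock() + delay > deadline:
            raise TimeoutError(f"run {run.get('run_id')} still in progress after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

With requests, fetch_status could be a small lambda that calls requests.get on the run-status URL and returns .json(). The sleep and clock parameters exist only so the loop can be exercised without real waiting.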

4. Run Completed

After waiting long enough, the query will return the completed run.

Response Example (Completed)

{
  "run_id": "9b64941a-4545-4c57-9174-c70e781d9192",
  "status": "completed",
  "total_requests": 2,
  "success_requests": 2,
  "failed_requests": 0,
  "timeout_requests": 0,
  "collection_id": "c38b0bcf-cb7c-4728-8704-2c2e267dcff9"
}

At this point:

status is completed.
All requests defined in the collection have been processed.
success_requests counts jobs that returned usable content (HTTP 2xx and no captcha or block signal).
failed_requests includes worker failures and jobs that completed but whose target returned 4xx/5xx or a block page.
timeout_requests covers jobs that exceeded the worker-level timeout.
On a completed run, total_requests = success_requests + failed_requests + timeout_requests always holds.
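The counters can be checked on the client with a few lines of Python. The sketch below takes a run-status response shaped like the completed example, verifies that the three counters add up to total_requests, and derives a success rate. The function name summarize_run and the keys of its return value are illustrative choices.

```python
from typing import Any


def summarize_run(run: dict[str, Any]) -> dict[str, Any]:
    """Check the counter invariant on a completed run and compute a success rate."""
    if run["status"] != "completed":
        raise ValueError(f"run is not completed (status={run['status']!r})")
    total = run["total_requests"]
    accounted = run["success_requests"] + run["failed_requests"] + run["timeout_requests"]
    if accounted != total:
        raise ValueError(f"counters do not add up: {accounted} != {total}")
    return {
        "total": total,
        "success_rate": run["success_requests"] / total if total else 0.0,
        # jobs worth inspecting in the per-job listing
        "needs_attention": run["failed_requests"] + run["timeout_requests"],
    }
```

A non-zero needs_attention value is a reasonable trigger for walking the per-job results rather than relying on the summary alone.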

Retrieving per-job results

The run-status endpoint above is a summary. To iterate each job's URL, custom_id, timings, and HTML, use the jobs listing endpoint with cursor pagination:

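import requests

# BASE_URL, HEADERS, COLLECTION_ID, run_id, and handle() are assumed to be defined by the caller.
cursor = None
while True:
    params = {"limit": 500, "order_by": "completed_at", "status_filter": "completed,failed,timeout"}
    if cursor:
        params["cursor"] = cursor  # resume where the previous page ended
    resp = requests.get(
        f"{BASE_URL}/v1/async/collections/{COLLECTION_ID}/runs/{run_id}/jobs",
        headers=HEADERS, params=params, timeout=30,
    )
    resp.raise_for_status()  # fail loudly instead of parsing an error body
    page = resp.json()
    for job in page["items"]:
        handle(job["custom_id"], job["url"], job["status"], job["status_code"])
    if not page.get("has_more"):
        break
    cursor = page["cursor_next"]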

With order_by=completed_at + since_completed_at you can stream completions incrementally without re-paginating the whole run on each poll. See the API reference for the full semantics.
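The incremental pattern might look like the sketch below. It rests on several assumptions that this page does not confirm: that since_completed_at accepts a job's completed_at value, that each job item carries a job_id field, and that the lower bound may be inclusive (hence the de-duplication set). The function new_completions and the fetch_page callable are hypothetical. fetch_page is expected to issue the GET against the jobs listing with the given query parameters and return the parsed page.

```python
from typing import Any, Callable, Iterator, Optional


def new_completions(
    fetch_page: Callable[[dict[str, Any]], dict[str, Any]],
    since: Optional[str],
    seen_ids: set[str],
) -> Iterator[dict[str, Any]]:
    """Yield jobs not yet seen, paging with the same cursor fields as the listing loop."""
    cursor = None
    while True:
        params: dict[str, Any] = {"limit": 500, "order_by": "completed_at"}
        if since is not None:
            params["since_completed_at"] = since  # assumed to take a completed_at value
        if cursor:
            params["cursor"] = cursor
        page = fetch_page(params)
        for job in page["items"]:
            job_id = job["job_id"]  # assumed field name
            if job_id not in seen_ids:  # guards against an inclusive lower bound
                seen_ids.add(job_id)
                yield job
        if not page.get("has_more"):
            return
        cursor = page["cursor_next"]
```

After each poll the caller would advance since to the latest completed_at it has processed and keep seen_ids between polls. The exact comparison rules should be taken from the API reference.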

To fetch the full HTML or extracted data of a specific job, call GET /v1/async/collections/{collection_id}/runs/{run_id}/jobs/{job_id}/result. HTML bodies are retained for 48 hours after completion; metadata (status, timings, URL, custom_id) is retained for 90 days in the listing endpoint.
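Because the two retention windows differ, a client can decide up front whether a result fetch is still worthwhile. The sketch below encodes the 48-hour and 90-day windows stated above. The function name retention_state and its return labels are illustrative, it assumes the completion time is available as a timezone-aware datetime, and it treats both bounds as inclusive, which the page does not specify.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

HTML_RETENTION = timedelta(hours=48)     # HTML bodies, per the text above
METADATA_RETENTION = timedelta(days=90)  # listing metadata, per the text above


def retention_state(completed_at: datetime, now: Optional[datetime] = None) -> str:
    """Return "html", "metadata_only", or "expired" for a job finished at completed_at."""
    now = now or datetime.now(timezone.utc)
    age = now - completed_at
    if age <= HTML_RETENTION:
        return "html"            # the result endpoint should still hold the body
    if age <= METADATA_RETENTION:
        return "metadata_only"   # only the listing metadata remains
    return "expired"
```

In practice this argues for downloading results soon after a run completes rather than on demand days later.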

Summary

This flow follows a simple three-part pattern:

Create a Collection: with one or more requests (each may include an optional custom_id for traceability).
Start the Execution: run the collection asynchronously.
Query the Run Status: use the run ID until it completes, then iterate the jobs listing to process each result.

Resilient submits (recommended for production)

Two small additions make the workflow safe against transient failures:

Generate an `Idempotency-Key` per submit

Pass a UUID in the Idempotency-Key header on POST /v1/async/collections. If the response is lost to a network timeout, a retry with the same key within 24 h returns the original collection without creating a duplicate (and without a second charge):

import uuid

import requests

# BASE_URL and HEADERS are assumed to be defined by the caller.
key = str(uuid.uuid4())  # generate once and reuse for every retry of this submit
resp = requests.post(
    f"{BASE_URL}/v1/async/collections",
    headers={**HEADERS, "Idempotency-Key": key},
    json={"name": "daily-2026-04-30", "requests": [...]},
)
# Safe to retry this POST on timeout: same key + same body returns the same collection.
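The retry itself could be wrapped as in the sketch below, where the important detail is that the key is generated once, outside the retry loop. The helper submit_with_retry, the send callable, and the backoff values are hypothetical. send(key) is expected to perform the POST with the given Idempotency-Key and the same body each time.

```python
import time
import uuid
from typing import Any, Callable


def submit_with_retry(
    send: Callable[[str], Any],
    retry_on: tuple[type[BaseException], ...] = (TimeoutError, ConnectionError),
    max_attempts: int = 3,
    backoff: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send(key) until it succeeds, reusing one Idempotency-Key for every attempt."""
    key = str(uuid.uuid4())  # generated once, outside the retry loop
    for attempt in range(1, max_attempts + 1):
        try:
            return send(key)
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(backoff * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
```

When send is built on requests, pass retry_on=(requests.exceptions.Timeout, requests.exceptions.ConnectionError), since those exceptions do not derive from the built-in TimeoutError and ConnectionError. A few attempts with short backoff stay well inside the 24-hour window mentioned above.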

Reattach to a live run via `GET /collections/{cid}/runs`

If the response to POST /run is lost, you don't need to retry the run (which would queue a duplicate). Look it up by collection — the run_id is already created server-side:

runs = requests.get(
    f"{BASE_URL}/v1/async/collections/{collection_id}/runs?status_filter=in_progress",
    headers=HEADERS,
).json()
if runs["total"] > 0:
    run_id = runs["items"][0]["run_id"]   # reattach
else:
    run_id = requests.post(
        f"{BASE_URL}/v1/async/collections/{collection_id}/run",
        headers=HEADERS,
    ).json()["run_id"]

These two patterns combined remove the most common production failure mode: doubled batches caused by client-side retry of an already-successful request.
